Data normalization
Initialization
We start by loading the packages required for the analyses in this section and by defining the root directory within which all input/output operations for this project are performed. Detailed software version information is provided at the end of this document to make the analysis easier to reproduce.
library(DT)
library(tidyverse)
library(data.table)
library(WriteXLS)
library(ggrepel)
library(ggpubr)
library(patchwork)
library(pheatmap)
library(RColorBrewer)
path = "/Users/ashwin/Documents/Projects/YeastScreen/Nonessential_MATa_screen/"
Normalization methodology
We use two different normalization strategies, each suited to a specific question -
- For a given plate \(i\), the cytoplasmic and mitochondrial roGFP2 ratios of each well \(j\) are divided by the median of all controls on the same plate, i.e. for cytoplasm \[NormCyto_{ij} = \frac{Cyto_{ij}} {median(Ctrl_{i})}\] and for mitochondria \[NormMito_{ij} = \frac{Mito_{ij}} {median(Ctrl_{i})}\]
This control-based plate normalization is suitable for comparing redox levels across organelles and nutrients, since all roGFP2 ratios are normalized to the same plate control.
- For a given plate \(i\), the cytoplasmic and mitochondrial roGFP2 ratios of each well \(j\) are centered on the plate-specific median and scaled by the plate-specific median absolute deviation (MAD) of the respective organelle, i.e. for cytoplasm \[NormCyto_{ij} = \frac{Cyto_{ij} - median(Cyto_{i})} {mad(Cyto_{i})}\] and for mitochondria \[NormMito_{ij} = \frac{Mito_{ij} - median(Mito_{i})} {mad(Mito_{i})}\]
This organelle-specific normalization is suitable for comparing redox levels within each organelle across all plates, since it expresses how far each value lies from its plate median in units of the plate MAD.
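As a small worked illustration of the two formulas, the sketch below applies both normalizations to one toy plate. The numbers and the variable names (`ctrl`, `cyto`, `frac_control`, `robust_z`) are made up for this example and are not taken from the screen.

```r
# Toy plate (hypothetical values): control wells and cytoplasmic mutant wells
ctrl <- c(0.9, 1.0, 1.1)
cyto <- c(1.2, 1.5, 1.8, 2.1, 6.0)

# Strategy 1: express each ratio as a fraction of the plate control median
frac_control <- cyto / median(ctrl)

# Strategy 2: robust Z-score against the plate's own median and MAD.
# stats::mad() multiplies the raw MAD by 1.4826 by default, so the result
# is comparable to a standard deviation for normally distributed data.
robust_z <- (cyto - median(cyto)) / mad(cyto)

print(round(frac_control, 3))
print(round(robust_z, 3))
```

In this toy plate the well at 6.0 stands out under both strategies, but only the first keeps the scale of the original ratio relative to the control.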
Furthermore, for each normalization strategy, we summarize the quadruplicate values per gene by taking the median of the scaled values. The outlier removal in the previous section introduced many NA values. Each mutant has 4 values: we remove mutants with 2 or more NA values, and for mutants with exactly one NA value we substitute it with the median of the other 3 observed values. Some mutants (genes) are also present in multiple copies, either on the same plate or on different plates. For these, we keep the set of quadruplicate and median-summarized values with the maximum absolute median value.
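The replicate handling described above can be sketched on a toy table as follows. This is a minimal base R illustration under assumed names (`reps`, `r1` to `r4`, `med`, `n_na`), not the pipeline code itself, which uses dplyr and keeps ties via `top_n()`.

```r
# Hypothetical toy table: two copies of GENE_A, one copy each of GENE_B, GENE_C
reps <- data.frame(
  Genes = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
  r1 = c(0.5, 2.0, NA, 1.0),
  r2 = c(0.4, 2.2, NA, NA),
  r3 = c(0.6, NA, 1.0, 1.2),
  r4 = c(0.5, 2.4, 1.1, 0.8),
  stringsAsFactors = FALSE
)
rep_cols <- c("r1", "r2", "r3", "r4")

# Median of the observed replicates and number of missing replicates per row
reps$med <- apply(reps[, rep_cols], 1, median, na.rm = TRUE)
reps$n_na <- rowSums(is.na(reps[, rep_cols]))

# Drop rows with 2 or more missing replicates (GENE_B here)
kept <- reps[reps$n_na < 2, ]

# Replace a single missing replicate with the median of the other three
for (col in rep_cols) {
  miss <- is.na(kept[[col]])
  kept[[col]][miss] <- kept$med[miss]
}

# For duplicated genes, keep the copy with the largest absolute median
kept <- kept[order(kept$Genes, -abs(kept$med)), ]
summarised <- kept[!duplicated(kept$Genes), ]
print(summarised)
```

Unlike `top_n()`, this sketch keeps only the first row when two copies have exactly the same absolute median.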
Finally, we generate the following -
- A table with all the replicate values per organelle and nutrient condition, together with the median-summarized value, from all plates
- A matrix with genes/mutants in rows and all combinations of organelle, nutrient and replicate (and the median-summarized value) in columns
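To make the shape of the second output concrete, the sketch below reshapes a small long table into a gene-by-condition matrix. It uses base R `tapply()` on made-up values; the pipeline itself uses `data.table::dcast()`, and the names `long`, `condition` and `wide` are assumptions for this example.

```r
# Hypothetical long table: one median value per gene, organelle and nutrient
long <- data.frame(
  Genes = c("GENE_A", "GENE_A", "GENE_B"),
  Organelle = c("Cytoplasm", "Mitochondria", "Cytoplasm"),
  Nutrient = c("Glucose", "Glucose", "Glucose"),
  Median = c(1.1, 1.6, 0.9),
  stringsAsFactors = FALSE
)

# One column per organelle/nutrient combination
long$condition <- paste(long$Organelle, long$Nutrient, sep = "_")

# Genes in rows, conditions in columns; missing combinations become NA
wide <- tapply(long$Median, list(long$Genes, long$condition),
               FUN = function(x) x[1])
print(wide)
```

GENE_B has no mitochondrial measurement in this toy table, so the corresponding cell is NA, which mirrors the `fill = NA` behaviour requested in the pipeline.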
Below is the table of cleaned raw data that we will use in our analysis.
rawDatCleaned = readRDS(paste0(path, "data/workspaces/YeastRedox_MATa_RawData_ForZNorm.RDS"))
rawDatCleaned = rawDatCleaned[, c("Plate", "Well_96", "Well", "Group", "Content",
"SystematicName", "SGD.ID", "Gene.Symbol", "roGFP2.ratio", "Type")]
rawDatCleaned_ctrlfilt = readRDS(paste0(path, "data/workspaces/YeastRedox_MATa_RawData_ForCtrlNorm.RDS"))
rawDatCleaned_ctrlfilt = rawDatCleaned_ctrlfilt[, c("Plate", "Well_96", "Well", "Group",
"Content", "SystematicName", "SGD.ID", "Gene.Symbol", "roGFP2.ratio", "Type")]
normalizeScreenData = function(normMethod) {
if (normMethod == "fracControl") {
normDat = split(rawDatCleaned_ctrlfilt, as.character(rawDatCleaned_ctrlfilt$Type))
normDat = lapply(normDat, function(x) {
split(x, x$Plate)
})
}
if (normMethod == "robustZ") {
normDat = split(rawDatCleaned, rawDatCleaned$Type)
normDat = lapply(normDat, function(x) {
split(x, x$Plate)
})
}
normDatReps = normDat
for (i in 1:length(normDat)) {
for (j in 1:length(normDat[[i]])) {
tmp = droplevels(normDat[[i]][[j]])
if (nrow(tmp) > 0) {
#---------------------------------------------------------------------------------------------
# Separate control, cytoplasmic and mitochondrial roGFP2 ratios per plate
#---------------------------------------------------------------------------------------------
# Control
tmp.ctrl = droplevels(tmp[tmp$Content == "Control", ])
# Cytoplasm
tmp.cyto = droplevels(tmp[tmp$Content == "Cytoplasm", ])
# Mitochondria
tmp.mito = droplevels(tmp[tmp$Content == "Mitochondria", ])
rm(tmp)
#---------------------------------------------------------------------------------------------
# Compute the corresponding normalization factors (median & median absolute
# deviation)
#---------------------------------------------------------------------------------------------
# Normalizing factor - Control
nf.ctrl.med = median(tmp.ctrl$roGFP2.ratio, na.rm = T)
# Normalizing factor - Cytoplasm
nf.cyto.med = median(tmp.cyto$roGFP2.ratio, na.rm = T)
nf.cyto.mad = mad(tmp.cyto$roGFP2.ratio, na.rm = T)
# Normalizing factor - Mitochondria
nf.mito.med = median(tmp.mito$roGFP2.ratio, na.rm = T)
nf.mito.mad = mad(tmp.mito$roGFP2.ratio, na.rm = T)
#---------------------------------------------------------------------------------------------
# The two normalization strategies (plate control and median based)
#---------------------------------------------------------------------------------------------
# Normalization 1 - Plate control based
if (normMethod == "fracControl") {
tmp.cyto$roGFP2.ratio = tmp.cyto$roGFP2.ratio/nf.ctrl.med
tmp.mito$roGFP2.ratio = tmp.mito$roGFP2.ratio/nf.ctrl.med
}
# Normalization 2 - Plate median based
if (normMethod == "robustZ") {
tmp.cyto$roGFP2.ratio = (tmp.cyto$roGFP2.ratio - nf.cyto.med)/nf.cyto.mad
tmp.mito$roGFP2.ratio = (tmp.mito$roGFP2.ratio - nf.mito.med)/nf.mito.mad
}
rm(nf.ctrl.med, nf.cyto.med, nf.cyto.mad, nf.mito.med, nf.mito.mad)
#---------------------------------------------------------------------------------------------
# Normalized data with replicates - converting from long to wide table format
#---------------------------------------------------------------------------------------------
#---Cytoplasm---#
tmp.cyto.repl = tmp.cyto %>% select(Gene.Symbol, Plate, Group, roGFP2.ratio,
Type, Content) %>% group_by(Gene.Symbol, Group) %>% mutate(pseurep = paste0("roGFP2_ratio_",
1:n())) %>% spread(key = pseurep, value = roGFP2.ratio) %>% ungroup() %>%
select(-Group) %>% rename(Genes = Gene.Symbol, Nutrient = Type,
Organelle = Content)
#---Mitochondria---#
tmp.mito.repl = tmp.mito %>% select(Gene.Symbol, Plate, Group, roGFP2.ratio,
Type, Content) %>% group_by(Gene.Symbol, Group) %>% mutate(pseurep = paste0("roGFP2_ratio_",
1:n())) %>% spread(key = pseurep, value = roGFP2.ratio) %>% ungroup() %>%
select(-Group) %>% rename(Genes = Gene.Symbol, Nutrient = Type,
Organelle = Content)
#--Compilation---#
if (identical(ncol(tmp.cyto.repl), ncol(tmp.mito.repl))) {
normDatReps[[i]][[j]] = rbind(tmp.cyto.repl, tmp.mito.repl)
} else {
normDatReps[[i]][[j]] = NA
}
# Deleting
rm(tmp.ctrl, tmp.cyto, tmp.mito, tmp.cyto.repl, tmp.mito.repl)
}
}
rm(j)
}
rm(i)
res = lapply(normDatReps, function(x) {
x = x[!is.na(x)]
x = do.call("rbind", x)
x = as.data.frame(x)
x$Plate = factor(x$Plate)
x$Organelle = factor(x$Organelle, levels = c("Mitochondria", "Cytoplasm"))
x$Nutrient = factor(x$Nutrient, levels = c("Glucose", "Galactose", "Glycerol"))
rownames(x) = 1:nrow(x)
return(x)
})
res = do.call("rbind", res)
#-------------------------------------------------------------------------------------------------
# Summarizing multiple mutants (i.e. same gene) from the same or different plates
# Also dropping genes with 2 or more NA values For genes with just 1 NA value,
# replacing it with the median of the remaining 3 observed values
#-------------------------------------------------------------------------------------------------
res = res %>% rowwise() %>% mutate(Median_roGFP2_ratio = median(c(roGFP2_ratio_1,
roGFP2_ratio_2, roGFP2_ratio_3, roGFP2_ratio_4), na.rm = T), NA_per_row = sum(is.na(c(roGFP2_ratio_1,
roGFP2_ratio_2, roGFP2_ratio_3, roGFP2_ratio_4)))) %>% filter(NA_per_row <
2) %>% ungroup() %>% mutate_at(vars(starts_with("roGFP2_ratio_")), function(x) ifelse(is.na(x),
.$Median_roGFP2_ratio, x)) %>% select(-NA_per_row) %>% group_by(Organelle,
Nutrient, Genes) %>% top_n(n = 1, abs(Median_roGFP2_ratio)) %>% ungroup()
#-------------------------------------------------------------------------------------------------
# Compiling all replicates across organelles and nutrient conditions into a
# single matrix
#-------------------------------------------------------------------------------------------------
redoxMat = data.table(res[, c("Genes", "Nutrient", "Organelle", "roGFP2_ratio_1",
"roGFP2_ratio_2", "roGFP2_ratio_3", "roGFP2_ratio_4")])
redoxMat = dcast(redoxMat, Genes ~ Organelle + Nutrient, fun.aggregate = function(x) {
x
}, fill = NA, value.var = c("roGFP2_ratio_1", "roGFP2_ratio_2", "roGFP2_ratio_3",
"roGFP2_ratio_4"))
redoxMat = redoxMat[, c(1, 2, 8, 14, 20, 3, 9, 15, 21, 4, 10, 16, 22, 5, 11,
17, 23, 6, 12, 18, 24, 7, 13, 19, 25)]
redoxMat = data.frame(redoxMat, row.names = 1, stringsAsFactors = F)
#-------------------------------------------------------------------------------------------------
# Compiling the median values across organelles and nutrient conditions into a
# single matrix
#-------------------------------------------------------------------------------------------------
redoxMat_median = data.table(res[, c("Genes", "Nutrient", "Organelle", "Median_roGFP2_ratio")])
redoxMat_median = dcast(redoxMat_median, Genes ~ Organelle + Nutrient, fun.aggregate = function(x) {
x
}, fill = NA, value.var = "Median_roGFP2_ratio")
redoxMat_median = data.frame(redoxMat_median, row.names = 1, stringsAsFactors = F)
#-------------------------------------------------------------------------------------------------
# Putting all the results into one
#-------------------------------------------------------------------------------------------------
res = list(redox_table = res, redox_replicates = redoxMat, redox_median = redoxMat_median)
rm(normDat, normDatReps, redoxMat, redoxMat_median)
return(res)
}
normDat.ctrNorm = normalizeScreenData(normMethod = "fracControl")
normDat.Znorm = normalizeScreenData(normMethod = "robustZ")
Normalized data overview
We normalize the data as described above. Below are the summary and the distribution of the normalized data -
- For per plate normalization based on plate specific control
Genes Plate Nutrient Organelle
Length:20194 104 : 536 Glucose :6832 Mitochondria:10980
Class :character 102 : 530 Galactose:6813 Cytoplasm : 9214
Mode :character 107 : 524 Glycerol :6549
142 : 502
116 : 476
101 : 467
(Other):17159
roGFP2_ratio_1 roGFP2_ratio_2 roGFP2_ratio_3 roGFP2_ratio_4
Min. :0.0001 Min. :0.0001 Min. :0.0001 Min. :0.0001
1st Qu.:1.2289 1st Qu.:1.2253 1st Qu.:1.2178 1st Qu.:1.2198
Median :1.5502 Median :1.5436 Median :1.5345 Median :1.5437
Mean :1.5739 Mean :1.5703 Mean :1.5647 Mean :1.5654
3rd Qu.:1.8616 3rd Qu.:1.8586 3rd Qu.:1.8481 3rd Qu.:1.8485
Max. :5.0074 Max. :5.7204 Max. :4.6066 Max. :5.0642
NA's :1285
Median_roGFP2_ratio
Min. :0.0001
1st Qu.:1.2243
Median :1.5486
Mean :1.5690
3rd Qu.:1.8518
Max. :4.5752
- For per plate normalization based on median organelle specific values
Genes Plate Nutrient Organelle
Length:20767 104 : 546 Glucose :7057 Mitochondria:11261
Class :character 102 : 532 Galactose:6967 Cytoplasm : 9506
Mode :character 107 : 524 Glycerol :6743
142 : 499
116 : 483
105 : 477
(Other):17706
roGFP2_ratio_1 roGFP2_ratio_2 roGFP2_ratio_3 roGFP2_ratio_4
Min. :-13.129041 Min. :-14.12713 Min. :-10.55667 Min. :-8.8123
1st Qu.: -0.669767 1st Qu.: -0.68972 1st Qu.: -0.72973 1st Qu.:-0.7219
Median : 0.005724 Median : -0.00748 Median : -0.05430 Median :-0.0490
Mean : 0.042939 Mean : 0.02005 Mean : -0.01189 Mean :-0.0141
3rd Qu.: 0.674491 3rd Qu.: 0.67449 3rd Qu.: 0.63977 3rd Qu.: 0.6186
Max. : 27.784782 Max. : 27.77066 Max. : 28.68175 Max. :27.7560
NA's :1325
Median_roGFP2_ratio
Min. :-8.44525
1st Qu.:-0.58048
Median :-0.04568
Mean : 0.01706
3rd Qu.: 0.54324
Max. :27.77772
- Number of mutants per condition
a = normDat.ctrNorm$redox_table
a = split(a$Genes, paste0(a$Nutrient, a$Organelle))
sapply(a, function(x) length(unique(x)))
GalactoseCytoplasm GalactoseMitochondria GlucoseCytoplasm
3074 3739 3300
GlucoseMitochondria GlycerolCytoplasm GlycerolMitochondria
3532 2840 3709
- Similarity between the normalization strategies
plotList = vector("list", 6)
names(plotList) = c("Glucose-Cytoplasm", "Glucose-Mitochondria",
"Galactose-Cytoplasm", "Galactose-Mitochondria",
"Glycerol-Cytoplasm", "Glycerol-Mitochondria")
plotListTop = plotList
for(i in c("Glucose", "Galactose", "Glycerol"))
{
for(j in c("Cytoplasm", "Mitochondria"))
{
a = normDat.Znorm$redox_table[which(normDat.Znorm$redox_table$Nutrient == i & normDat.Znorm$redox_table$Organelle == j),]
a = a[, c("Genes", "Median_roGFP2_ratio")]
colnames(a) = c("Genes", "Znorm")
b = normDat.ctrNorm$redox_table[which(normDat.ctrNorm$redox_table$Nutrient == i & normDat.ctrNorm$redox_table$Organelle == j),]
b = b[, c("Genes", "Median_roGFP2_ratio")]
colnames(b) = c("Genes", "Ctrlnorm")
df = merge(a,b)
topZ = quantile(df$Znorm, probs = c(0.05, 0.95))
topC = quantile(df$Ctrlnorm, probs = c(0.05, 0.95))
df_top = df[which(df$Znorm < topZ[1] | df$Znorm > topZ[2] | df$Ctrlnorm < topC[1] | df$Ctrlnorm > topC[2]),]
id = paste(i,j,sep="-")
plotList[[id]] = ggscatter(df, x = "Znorm", y = "Ctrlnorm",
color = "black", shape = 20, size = 0.5, # Points color, shape and size
add = "reg.line", # Add regression line
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
conf.int = TRUE, # Add confidence interval
cor.coef = TRUE, # Add correlation coefficient. see ?stat_cor
cor.coeff.args = list(method = "spearman", label.x = -10, label.y = 3, label.sep = "\n"),
ggtheme = theme_classic(base_size = 8)) + labs(subtitle = id)
plotListTop[[id]] = ggscatter(df_top, x = "Znorm", y = "Ctrlnorm",
color = "black", shape = 20, size = 0.5, # Points color, shape and size
add = "reg.line", # Add regression line
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
conf.int = TRUE, # Add confidence interval
cor.coef = TRUE, # Add correlation coefficient. see ?stat_cor
cor.coeff.args = list(method = "spearman", label.x = -10, label.y = 3, label.sep = "\n"),
ggtheme = theme_classic(base_size = 8)) + labs(subtitle = id)
rm(a, b, df,df_top, topZ, topC, id)
}
rm(j)
}
rm(i)
Plotting the correlation between the two normalization techniques for all matching mutants
p = plotList$`Glucose-Cytoplasm` + plotList$`Glucose-Mitochondria` + plotList$`Galactose-Cytoplasm` +
plotList$`Galactose-Mitochondria` + plotList$`Glycerol-Cytoplasm` + plotList$`Glycerol-Mitochondria` +
plot_layout(nrow = 3, ncol = 2)
ggsave(filename = paste0(path, "analysis/normalization/Correlation_among_normalizatin_methods_All.pdf"),
plot = p, width = 7, height = 7)
p
Plotting the correlation between the two normalization techniques using ONLY the top hits, i.e. the highest and lowest 5% of roGFP2 ratios from either normalization method
p = plotListTop$`Glucose-Cytoplasm` + plotListTop$`Glucose-Mitochondria` + plotListTop$`Galactose-Cytoplasm` +
plotListTop$`Galactose-Mitochondria` + plotListTop$`Glycerol-Cytoplasm` + plotListTop$`Glycerol-Mitochondria` +
plot_layout(nrow = 3, ncol = 2)
ggsave(filename = paste0(path, "analysis/normalization/Correlation_among_normalizatin_methods_TopHits.pdf"),
plot = p, width = 7, height = 7)
p
Sample similarity - Dimensionality reduction
Next, we apply multidimensional scaling (MDS), a dimensionality reduction method, to the plate control normalized and robust Z normalized roGFP2 ratios to identify the major groupings in the redox status of the yeast mutants across organelle and nutrient conditions.
NOTE: I am also including the plots for the robust Z normalized data simply to highlight that robust Z normalization is NOT appropriate for this analysis.
The plate control normalized data clearly show that the redox status differs between the organelles (higher in mitochondria than in cytoplasm) and that, within each organelle, the mutants group by nutrient condition.
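As a minimal illustration of how classical MDS separates groups of samples, the sketch below runs `cmdscale()` on a simulated matrix. The data, the sample names and the object `toy` are hypothetical; only the `dist(t(...))` followed by `cmdscale(..., k = 2)` pattern mirrors the analysis code.

```r
set.seed(1)

# Hypothetical matrix: 50 mutants (rows) x 6 samples (columns).
# Samples 1-3 are simulated around 1, samples 4-6 around 3.
toy <- cbind(
  matrix(rnorm(150, mean = 1, sd = 0.1), nrow = 50),
  matrix(rnorm(150, mean = 3, sd = 0.1), nrow = 50)
)
colnames(toy) <- paste0("sample_", 1:6)

# dist() computes distances between rows, so transpose to compare samples
mds <- cmdscale(dist(t(toy)), k = 2)
colnames(mds) <- c("MDS1", "MDS2")
print(round(mds, 2))
```

Because the two simulated groups differ strongly in level, they end up on opposite sides of the first MDS axis, which is the kind of separation one would look for between organelles in the plots.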
getMDSdata = function(dat) {
mds = cmdscale(dist(t(dat)), eig = TRUE, k = 2)$points
colnames(mds) = c("MDS1", "MDS2")
col_anno = do.call("rbind", strsplit(rownames(mds), "_"))
col_anno = col_anno[, -c(1:3)]
colnames(col_anno) = c("Compartment", "Nutrient")
mds = data.frame(mds, col_anno)
}
mdsZ = getMDSdata(dat = normDat.Znorm$redox_replicates)
mdsC = getMDSdata(dat = normDat.ctrNorm$redox_replicates)
p1 = ggplot(mdsZ, aes(x = MDS1, y = MDS2)) + theme_classic(base_size = 8) + labs(subtitle = "Robust Z normalized") +
geom_point(aes(color = Nutrient, shape = Compartment), size = 2) + scale_color_manual(values = c("#1B9E77",
"#D95F02", "#7570B3"))
p2 = ggplot(mdsC, aes(x = MDS1, y = MDS2)) + theme_classic(base_size = 8) + labs(subtitle = "Plate control normalized") +
geom_point(aes(color = Nutrient, shape = Compartment), size = 2) + scale_color_manual(values = c("#1B9E77",
"#D95F02", "#7570B3"))
pA = p1 + p2 + plot_layout(guides = "collect")
p3 <- ggviolin(normDat.Znorm$redox_table, x = "Organelle", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Organelle",
facet.by = "Nutrient", color = "grey90") + stat_compare_means(label = "p.format",
method = "wilcox", cex = 2) + labs(subtitle = "Robust Z normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
scale_fill_manual(values = c("#E7298A", "#66A61E"))
p4 <- ggviolin(normDat.ctrNorm$redox_table, x = "Organelle", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Organelle",
facet.by = "Nutrient", color = "grey90") + stat_compare_means(label = "p.format",
method = "wilcox", cex = 2) + labs(subtitle = "Plate control normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
scale_fill_manual(values = c("#E7298A", "#66A61E"))
pB = p3 + p4 + plot_layout(guides = "collect")
p5 = ggviolin(normDat.Znorm$redox_table, x = "Nutrient", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Nutrient",
facet.by = "Organelle", color = "grey90") + stat_compare_means(method = "anova",
label.y = 22, label.x = 1.5, cex = 2) + stat_compare_means(label = "p.signif",
method = "wilcox", ref.group = ".all.", hide.ns = TRUE) + scale_fill_manual(values = c("#D95F02",
"#1B9E77", "#7570B3")) + labs(subtitle = "Robust Z normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank())
p6 = ggviolin(normDat.ctrNorm$redox_table, x = "Nutrient", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Nutrient",
facet.by = "Organelle", color = "grey90") + stat_compare_means(method = "anova",
label.y = 5, label.x = 1.5, cex = 2) + stat_compare_means(label = "p.signif",
method = "wilcox", ref.group = ".all.", hide.ns = TRUE) + scale_fill_manual(values = c("#D95F02",
"#1B9E77", "#7570B3")) + labs(subtitle = "Plate control normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank())
pC = p5 + p6 + plot_layout(guides = "collect")
p = pA/pB/pC
ggsave(filename = paste0(path, "analysis/normalization/roGFP2_ratio_comparison_nutrient_compartments.pdf"),
plot = p, width = 7, height = 7)
p
Replicate similarity - Correlation
Below we show the correlation among replicates with plate control based normalization.
c1 = cor(normDat.ctrNorm$redox_replicates, method = "spearman", use = "pairwise.complete.obs")
col_anno = do.call("rbind", strsplit(colnames(c1), "_"))
col_anno = col_anno[, -c(1:3)]
colnames(col_anno) = c("Compartment", "Nutrient")
rownames(col_anno) = colnames(c1)
col_anno = data.frame(col_anno, stringsAsFactors = F)
colr = list(Compartment = c(Mitochondria = "#E7298A", Cytoplasm = "#66A61E"), Nutrient = c(Glucose = "#D95F02",
Galactose = "#1B9E77", Glycerol = "#7570B3"))
c1[c1 == 1] = NA
pheatmap(c1, clustering_method = "ward.D2", color = brewer.pal(n = 9, "Greys"), show_rownames = F,
show_colnames = F, border_color = "white", na_col = "white", annotation_row = col_anno,
annotation_col = col_anno, annotation_colors = colr, fontsize = 8, main = "Control normalize")
pheatmap(c1, clustering_method = "ward.D2", color = brewer.pal(n = 9, "Greys"), show_rownames = F,
show_colnames = F, border_color = "white", na_col = "white", annotation_row = col_anno,
annotation_col = col_anno, annotation_colors = colr, fontsize = 8, filename = paste0(path,
"analysis/normalization/replicate_correlation_normCtr.pdf"), width = 5, height = 5)
Distribution of normalized data
- For per plate normalization based on plate specific control and an interactive data table to access the data
p = ggplot(normDat.ctrNorm$redox_table) + theme_bw(base_size = 8) +
#labs(y = "median roGFP2 ratio (in log scale)") +
geom_boxplot(aes(x = Plate, y = Median_roGFP2_ratio), outlier.size = 0.1, lwd=0.2) + #ylim(-10, 10) +
#scale_y_continuous(trans='log10') +
facet_grid(Nutrient ~ Organelle) +
#geom_text_repel(data = subset(normDat.ctrNorm$redox_table, abs(Median_roGFP2_ratio) > 3), aes(x = Plate, y = Median_roGFP2_ratio, label = Genes), size = 2) +
theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5))
ggsave(filename = paste0(path,"analysis/normalization/roGFP2_ratio_distribution_PlateControl_normalized.pdf"),
plot = p, width = 10, height = 15)
p
tmp = normDat.ctrNorm$redox_table
tmp[,5:9] = round(tmp[,5:9],3)
datatable(tmp, rownames = FALSE, filter="top", class="compact",
extensions = c('Buttons') ,
options = list(autoWidth = TRUE,
dom = 'Bfrtip',
buttons = c('csv', 'excel')
))
- For per plate normalization based on median organelle specific values and an interactive data table to access the data
p = ggplot(normDat.Znorm$redox_table) + theme_bw(base_size = 9) + #geom_hline(yintercept = c(-5,5)) +
geom_boxplot(aes(x = Plate, y = Median_roGFP2_ratio), outlier.size = 0.1, lwd = 0.2) + #ylim(-10, 10) +
facet_grid(Nutrient ~ Organelle) +
#geom_text_repel(data = subset(normDat.Znorm$redox_table, Median_roGFP2_ratio > 5 | Median_roGFP2_ratio < -5), aes(x = Plate, y = Median_roGFP2_ratio, label = Genes), size = 2) +
theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5))
ggsave(filename = paste0(path,"analysis/normalization/roGFP2_ratio_distribution_RobustZ_normalized.pdf"),
plot = p, width = 10, height = 10)
p
tmp = normDat.Znorm$redox_table
tmp[,5:9] = round(tmp[,5:9],3)
datatable(tmp, rownames = FALSE, filter="top", class="compact",
extensions = c('Buttons') ,
options = list(autoWidth = TRUE,
dom = 'Bfrtip',
buttons = c('csv', 'excel')
))
Saving the data
Finally, we save the normalized data as .RDS (R data object) and .xlsx (Excel) files. All downstream analyses start from these normalized data.
saveRDS(normDat.ctrNorm, paste0(path, "data/workspaces/YeastRedox_MATa_PlateControl_NormalizedData.RDS"))
saveRDS(normDat.Znorm, paste0(path, "data/workspaces/YeastRedox_MATa_RobustZ_NormalizedData.RDS"))
WriteXLS(normDat.ctrNorm$redox_table, ExcelFileName = paste0(path, "analysis/supplementary/tables/YeastRedox_MATa_PlateControl_NormalizedData.xlsx"),
AdjWidth = TRUE, BoldHeaderRow = TRUE, FreezeRow = 1)
WriteXLS(normDat.Znorm$redox_table, ExcelFileName = paste0(path, "analysis/supplementary/tables/YeastRedox_MATa_RobustZ_NormalizedData.xlsx"),
AdjWidth = TRUE, BoldHeaderRow = TRUE, FreezeRow = 1)
Session information
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-2 pheatmap_1.0.12 patchwork_1.0.0 ggpubr_0.2.4
[5] magrittr_1.5 ggrepel_0.8.1 WriteXLS_5.0.0 data.table_1.12.8
[9] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.4 purrr_0.3.3
[13] readr_1.3.1 tidyr_1.0.2 tibble_2.1.3 ggplot2_3.2.1
[17] tidyverse_1.3.0 DT_0.12 rmdformats_0.3.6 knitr_1.28
loaded via a namespace (and not attached):
[1] httr_1.4.1 jsonlite_1.6.1 modelr_0.1.5 shiny_1.4.0
[5] assertthat_0.2.1 cellranger_1.1.0 yaml_2.2.1 pillar_1.4.3
[9] backports_1.1.5 lattice_0.20-38 glue_1.3.1 digest_0.6.23
[13] promises_1.1.0 ggsignif_0.6.0 rvest_0.3.5 colorspace_1.4-1
[17] htmltools_0.4.0 httpuv_1.5.5 plyr_1.8.5 pkgconfig_2.0.3
[21] broom_0.5.4 haven_2.2.0 bookdown_0.17 xtable_1.8-4
[25] scales_1.1.0 later_1.0.0 generics_0.0.2 farver_2.0.3
[29] ellipsis_0.3.0 withr_2.1.2 lazyeval_0.2.2 cli_2.0.1
[33] mime_0.9 crayon_1.3.4 readxl_1.3.1 evaluate_0.14
[37] fs_1.3.1 fansi_0.4.1 nlme_3.1-144 xml2_1.2.2
[41] tools_3.6.2 hms_0.5.3 formatR_1.7 lifecycle_0.1.0
[45] munsell_0.5.0 reprex_0.3.0 compiler_3.6.2 rlang_0.4.4
[49] grid_3.6.2 rstudioapi_0.11 htmlwidgets_1.5.1 crosstalk_1.0.0
[53] labeling_0.3 rmarkdown_2.1 gtable_0.3.0 DBI_1.1.0
[57] reshape2_1.4.3 R6_2.4.1 lubridate_1.7.4 fastmap_1.0.1
[61] stringi_1.4.5 Rcpp_1.0.3 vctrs_0.2.2 dbplyr_1.4.2
[65] tidyselect_1.0.0 xfun_0.12